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In the past 2 decades latent variable modeling has become a standard tool in the 
social sciences. In the same time period, traditional linear structural equation models 
have been extended to include non-linear interaction and quadratic effects (e.g., Klein 
and Moosbrugger, 2000), and multilevel modeling (Rabe-Hesketh et al., 2004). We 
present a general non-linear multilevel structural equation mixture model (GNM-SEMM) 
that combines recent semiparametric non-linear structural equation models (Kelava and 
Nagengast, 2012; Kelava et al., 2014) with multilevel structural equation mixture models 
(Muthen and Asparouhov, 2009) for clustered and non-normally distributed data. The 
proposed approach allows for semiparametric relationships at the within and at the 
between levels. We present examples from the educational science to illustrate different 
submodels from the general framework. 
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In the past 2 decades latent variable modeling has become a 
standard tool in the social sciences. Linear structural equation 
models have been extended to include non-linear interaction and 
quadratic effects (for a review see Schumacker and Marcoulides, 
1998; Algina and Moulder, 2001; Marsh et al, 2004, 2006), and 
for the capability to model multilevel data structures (e.g., Rabe- 
Hesketh et al., 2004; Muthen and Asparouhov, 2009). However, 
a systematic combination of both non-linear structural equa- 
tion modeling and multilevel modeling has not been imple- 
mented in a more general framework. In this article, we present 
a GNM-SEMM that combines recent semiparametric non-linear 
structural equation models (Kelava and Nagengast, 2012; Kelava 
et al., 2014) with multilevel structural equation mixture models 
(Muthen and Asparouhov, 2009) for clustered and non-Gaussian 
data. The proposed framework is capable of modeling non-linear 
parametric and semiparametric relationships at the within and at 
the between levels, and it allows non-normally distributed data to 
be considered. We will provide an empirical example from edu- 
cational sciences to illustrate the applicability of the proposed 
framework. We will begin by providing an overview of current 
approaches for estimating non-linear structural equation mod- 
els and current frameworks for multilevel structural equation 
(mixture) models. 

1. NON-LINEAR STRUCTURAL EQUATION MODELS 

Numerous parametric approaches for the estimation of non- 
linear effects have been developed (for a review, see Schumacker 
and Marcoulides, 1998; Algina and Moulder, 2001; Marsh et al., 
2004, 2006), including product indicator approaches (e.g., Kenny 
and Judd, 1984; Bollen, 1995; laccard and Wan, 1995; Ping, 1995; 
Joreskog and Yang, 1996; Algina and Moulder, 2001; Marsh et al., 
2004, 2006; Little et al, 2006; Kelava and Brandt, 2009), dis- 
tribution analytic approaches (Klein and Moosbrugger, 2000; 



Klein and Muthen, 2007), Bayesian approaches (e.g., Arminger 
and Muthen, 1998; Lee et al, 2007), and method of moments 
based approaches (Wall and Amemiya, 2003; Mooijaart and 
Bender, 2010). Whereas most product indicator approaches have 
been ad-hoc methods for the specification of non-linear interac- 
tion effects and have thus suffered from requiring complicated 
measurement models, recent distribution analytic and Bayesian 
approaches have tried to overcome the need for non-linear mea- 
surement models. Method-of-moments-based approaches (Wall 
and Amemiya, 2003; Mooijaart and Bentler, 2010) and some indi- 
cator approaches (Bollen, 1995; Joreskog and Yang, 1996) have 
been proposed as methods that do not rely as heavily on the 
normality assumption of the latent variables as other approaches 
(e.g., the distribution analytic approaches). The relaxation of dis- 
tributional assumptions may lead to a reduction in the threat of 
biased estimates for non-linear effects in situations in which data 
are non-normally distributed, but for most of these approaches, 
relaxing these assumptions is associated with a low power for 
detecting the effects (Schermelleh-Engel et al., 1998; Brandt et al., 
2014). 

A different approach for modeling non-linear relations 
between latent variables is the use of semiparametric structural 
equation mixture models (SEMM; Arminger and Stein, 1997; 
Jedidi et al., 1997a,b; Dolan and van der Maas, 1998; Arminger 
et al, 1999; Muthen, 2001; Bauer and Curran, 2004; Bauer, 
2005; Pek et al., 2009, 2011). Finite mixtures of linear structural 
equation models are used to approximate the unknown func- 
tional form of the non-linear relationship of the latent variables 1 . 



In SEMM linear models are estimated within several latent classes. Non- 
linear relationships between two variables are modeled by the parameter 
estimates for the linear effects that change in size across the (finite number 
of) latent classes. 
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Furthermore, by assuming mixtures, the SEMM approach relaxes 
the assumption of normally distributed latent variables and dis- 
turbances necessary in conventional structural equation models. 
Therefore, the SEMM approach is a flexible tool for predicting 
latent dependent variables when data are not normal, and when 
obtaining a strict parametric representation of the functional rela- 
tion does not have the highest priority (for a discussion see Bauer, 
2005). However, one drawback is that the linearity assumption of 
latent relationships and the normality assumption of the latent 
variables are relaxed simultaneously. This drawback can be man- 
ifested in the problem that observed non-normality in the data 
cannot be attributed to either non-normality of the latent vari- 
ables or non-linearity between the latent variables. A way to 
overcome this problem is the specification of non-linear struc- 
tural equation mixture models (NSEMM; Kelava et al., 2014) 
that allow distributional and linearity assumptions to be relaxed 
separately for the latent variables and their relationships. 

Although, the use of mixtures for modeling non-linear latent 
variable relationships (e.g., Curran et al, 1996; Dolan and van der 
Maas, 1998; Bauer and Curran, 2004; Bauer, 2005) or the non- 
normality of latent variables in the context of non-linear struc- 
tural equation models (Lubke and Muthen, 2005; Lee et al, 2008; 
Yang and Dunson, 2010; Kelava and Nagengast, 2012; Brandt 
et al., 2014; Kelava et al., 2014) have received increased attention 
in recent years, systematic evaluations have been rare. As an addi- 
tional limitation, all approaches presented so far have been strictly 
limited to single-level models and have not accounted for nested 
data structures. 

2. MULTILEVEL STRUCTURAL EQUATION MODELING 

Nested data structures have been addressed with multilevel mod- 
els for relationships between manifest variables (for an intro- 
duction see Snijders and Bosker, 1999; Hox, 2010). In the past 
2 decades, researchers have proposed frameworks that are capa- 
ble of modeling nested data structures in latent variable models 
(e.g., Muthen, 1994; Rabe-Hesketh et al, 2004; Muthen and 
Asparouhov, 2009). For example, these frameworks have included 
models that account for random effects on the within-level, mul- 
tilevel path analysis (Heck and Thomas, 2000), or multilevel 
confirmatory factor analysis (Muthen, 1994). Furthermore, mix- 
tures of distributions have been applied in latent growth curve 
modeling (Muthen and Asparouhov, 2009). 

So far, very limited psychometric developments have been pro- 
posed in the context of non-linear multilevel structural equation 
models that incorporate latent interaction effects. Leite and Zuo 
(2011) presented a product-indicator-based approach that allows 
for a specification of latent interactions on the between-level (e.g., 
at the school level). Their approach was a first attempt to extend 
the product-indicator approach for non-linear interaction effects 
in latent multilevel models. Products of between-level indicators 
are used for the specification of a measurement model of the 
between-level latent product variable. 

Focusing more generally on within-person processes in 
psychology (Molenaar, 2004; Molenaar and Campbell, 2009), 
Nagengast et al. (2013) adapted the unconstrained product indi- 
cator approach to account for latent interactions on the within- 
level. In predicting homework motivation, they found support 



for the latent interaction between homework expectancy and 
homework value at the within-student level. 

Despite these first successful adaptations, several problems that 
are associated with single-level non-linear structural equation 
modeling remain unsolved. First, the hitherto applied constrained 
and unconstrained product-indicator approaches for multilevel 
models are vulnerable to violations of distributional assumptions 
(normal distributions are typically assumed; for a discussion see 
Kelava et al., 2011). The specification of constrained and uncon- 
strained product-indicator approaches strongly depends on the 
distributions involved (Kelava and Brandt, 2009), and biased esti- 
mates of the parameters and standard errors can be expected 
when specification errors occur (Kelava et al., 2008) or distri- 
butional assumptions are not met (Kelava and Nagengast, 2012). 
Hence, product-indicator approaches that are extended for mul- 
tilevel data structures are even more vulnerable because more 
distributional assumptions on different levels have to be met. 

Second, the proposed extensions of single-level non-linear 
structural equation models specify a parametric non-linearity 
(by involving products of latent variables). Recently, a strong 
emphasis has been placed on the relaxation of this simple func- 
tional relationship, including mixtures of latent variables that also 
allow for non-normally distributed variables (e.g., Bauer, 2005; 
Kelava et al., 2014). Therefore, on the one hand there is a need 
for an optional specification of a semiparametric relationship of 
the latent variables (at the within and between levels) to better 
approximate the non-linear reality. On the other hand, there is a 
need for an optional specification of mixtures that can account 
for non-normality or heterogeneity across subpopulations. 

Third, the application of single-level non-linear structural 
equation modeling in substantive research has suffered from 
too many approaches that use the same distributional assump- 
tions (see paragraphs above) and too few simulation studies 
that offer clear recommendations for the application of specific 
approaches (for an overview, see Kelava et al., 2011). Approaches 
that agree with regard to distributional assumptions may lead to 
contradictory results; that is, some approaches might suggest sig- 
nificant non-linear effects, whereas others might not. Substantive 
researchers cannot solve this kind of problem by referring to 
empirical data. Further information that is based on simulation 
studies (for single-level non-linear models see e.g., Brandt et al., 
2014) is needed here. 

In total, there is a need for a framework that incorporates sev- 
eral special cases of multilevel modeling and that offers general as 
well as specific solutions for both substantive and methodological 
research in non-linear latent variable modeling. From a substan- 
tive standpoint, non-linear hypotheses (e.g., interactions) can be 
examined in more detail. From a methodological standpoint, the 
framework will foster the comparison of different kinds of esti- 
mators (e.g., MCMC, ML, or moment methods) in the context of 
different distributions. 

As a result of these considerations, in the next section, we 
will present a general non-linear multilevel structural equation 
mixture modeling (GNM-SEMM)framework that allows for the 
separate relaxation of distributional and linearity assumptions of 
the latent variables and their relationships on different levels of 
a nested data structure. We will provide several theoretical and 
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practical examples to illustrate what is possible within the frame- 
work. In general, within this framework, it is possible to derive 
specific submodels that include crucial parts of the model as well 
as a combination of several aspects that have not been combined 
before. 

3. A GENERAL NON-LINEAR MULTILEVEL STRUCTURAL 
EQUATION MIXTURE MODEL 

In this section, we will present a GNM-SEMM framework that 
allows for semiparametric latent non-linear effects on the within 
and the between levels. The framework presented here is simi- 
lar to the general multilevel mixture model and notation pre- 
sented by Muthen and Asparouhov (2009). Whereas Muthen 
and Asparouhov's (2009) model focuses only on linear rela- 
tionships, the GNM-SEMM framework accounts for non-linear 
semiparametric relationships of the manifest and latent vari- 
ables involved. This allows for a more precise modeling of latent 
variable relationships at different data levels while relaxing the lin- 
earity assumptions of standard latent multilevel frameworks (e.g., 
Rabe-Hesketh et al, 2004). 

3.1. OBSERVED AND MIXTURE VARIABLES 

3.1.1. Definition 

Let yjik be the score of the j-th (j = 1, . . . ,/) observed (indica- 
tor) variable for individual i (i = 1, . . . , Nk) in a cluster k {k = 
1, . . . , K). Note that the individual index i is cluster-specific. Its 
range depends on the cluster size Nk (e.g., the number of pupils 
in a given school k is denoted as Nk). Let zjk be the score of the 
/-th (Z = 1 , . . . , L) observed (indicator) variable for cluster k. The 
observed scores yifk and z/j.- could be realizations of dichotomous, 
ordered categorical, continuous normally distributed, or count 
variables. 

Categorical (mixture) variables are used for the definition of 
mixtures on the individual (within) and cluster (between) levels. 
Let Cik be an within-level latent categorical variable for individual 
i in cluster k, which takes values 1, . . . , C*,. Let Dk be a between- 
level latent categorical variable for cluster k, which takes values 
1, . . . , D*. Note that the number of latent classes on the within- 
level may be different across the latent classes on the between- 
level. 

Analogous to Rabe-Hesketh et al. (2004), Muthen (1984), and 
Muthen and Asparouhov (2009), for observed dichotomous and 
ordered categorical variables, the underlying normally distributed 
latent variables y* ik and z* k are defined such that for a set of thresh- 
old parameters r; sa j and xyd, and categories s and s', respectively, 
the following equations hold for each subject in cluster k: 

Yjik = s\c ik =c,D k = d tjscd < )>* lk < £/,s+l,cd (1) 
Zlk = s'\D k = d O T/s'd < Z* k < *y + \,d, (2) 

where the vertical bar -|- indicates a "conditional on" state- 
ment, and -o- indicates an equivalence. For continuous nor- 
mally distributed variables, y* ik = y^k and z* k = zik are assumed, 
and for count variables, y* jk = log(A.j;fc) and z* k = log(^ik) hold, 
where Xjik and Xik are the expectations of the Poisson distribu- 
tion. Additional assumptions regarding the mean and covariance 



structure will be made in the following subsections, which will 
specify the measurement and structural models on the within and 
between levels. 

3.1.2. Example 

Suppose that pupils from several schools take part in a math test. 
For a given pupil i from school k the score on a sub-task; from the 
math test is given by y^t- In addition, for school k, there is a score 

that indicates the school's social problems (e.g., the degree of 
bullying reported by the principal). In Figure 1, two latent cate- 
gorical variables Cft and Dk on the within-level (Level 1) and the 
between-level (Level 2), respectively, are introduced. These vari- 
ables may account for heterogeneity that occurs in the scores on 
both levels. On Level 1, heterogeneity in the distribution of the 
math test may occur due to additional private lessons in math 
that some pupils received. On Level 2, heterogeneity may occur 
in the distribution of the school's social problems, for exam- 
ple, due to the general (unobserved) socioeconomic status of the 
neighborhood where the school is located. Furthermore, school 
k might belong to an unobserved group of schools Dk = d that 
explicitly prepared for the math test. This may then influence the 
distribution of the math scores. 

Figure 1 shows a diagram with the observed and mixture 
variables. At this stage, there is no model that can explain the 
relationship between the scores yjik and z\k and no measure- 
ment model that can describe the realizations of the scores. The 
mixtures are indicated by C,k and Dk. 

3.2. LEVEL 1- WITHIN LEVEL 
3.2.1. Measurement model 

3.2.1.1. Definition. Let y* k be the /-dimensional vector for indi- 
vidual i in cluster k that includes scores for all dependent observed 
within variables. The measurement model is defined by a mixture 
distribution model 

Y*k\c ik = c,D k = d = Vikcd + A-lkcdfliliikcd) + Kifcdgl (xiat) + eukcd (3) 

where \>\kcd is a /-dimensional vector of latent intercepts, A.\k cd 
is a / x myi) loading matrix. rj ukcd = (rj Utkcd , nukmcd)' is an 
M-dimensional vector of variables including all latent exogenous 
and endogenous variables. f\ ( • ) is a smooth polynomial function 
mapping the m-dimensional variable vector r] likcd to an m^y 
dimensional vector /H*/^^). fi liked) cou ld be a vector that 
includes product variables [e.g., imiikcd, linked, r)\\ikcdn\2\kcd)' 











subject / ! Cik ) 1 


cluster k \ D k ) \ 




yjik 




z lk 





FIGURE 1 | Observed variable scores yy, k (within-level) and z\ k 
(between-level) as well as mixtures C jk (within-level) and D k 
(between-level). 
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or (tinted, (miikcd) 2 , mtted, (tinted) 2 )'] (e.g., Schumacker and 
Marcoulides, 1998; Kelava et al., 2011) or splines (Freund and 
Hoppe, 2007). Kiked is a / x Q( gl ) matrix with regression coeffi- 
cients. Xi;j- is a Q-dimensional vector of all observed unexplained 
(within) covariates that may have an additional influence on the 
indicator variables y* k . gi( ■ ) is a smooth polynomial function 
mapping the Q-dimensional vector of covariates to a Q( gl )- 
dimensional vector gi (xijfc), and € is a /-dimensional vector of 
residual variables with a zero mean vector and covariance matrix 

&lked- 

For observed categorical variables jik, a normality assumption 
for e uked is equivalent to a probit regression for y^ on risked and 
Xijfc. Alternatively, for dichotomous variables jik, « liked can have 
a logistic distribution, resulting in a logistic regression. For count 
variables y^, the residual euked is assumed to be zero. For nor- 
mally distributed continuous variables y*, the residual variable 
f liked is assumed to be normally distributed. 

3.2.1.2. Example. Suppose that in the above-mentioned math 
test example, data for two additional constructs (attitude toward 
reading and the teaching strategies experienced by the student) 
were collected with three items for each construct. The measure- 
ment model [cp. Equation (3)] is illustrated in Figure 2, and 
accordingly, it assumes two latent factors f)nte (attitude toward 
reading) and tjuikc (experienced teaching strategies). For didac- 
tical purposes, all schools here belong to one class D = 1, so 
that the index d can be omitted, and there is no between-level 
model. Furthermore, heterogeneity is assumed on the within- 
level such that each pupil i belongs to an unobserved class (mix- 
ture) Qk = c. The example measurement model derived from the 
framework above is a confirmatory factor mixture model that is 
given by yik\c, k =e = v ikc + MkcVite + e nke- The heterogeneity, 
which is implied by the mixture c, can be accounted for differ- 
ently by the (statistical) model depending on the hypothesized 
population model: First, a non-normal distribution of the latent 
variables can be modeled as a mixture distribution. For exam- 
ple, attitude toward reading might not be normally distributed. 
A mixture distribution of tjnikc (with varying expectations and 



covariance structure for each mixture component c) could rep- 
resent the non-normality (see Kelava et al., 2014). Second, the 
measurement model might be completely different for each 
unobserved subgroup (with varying factor loadings etc.). For 
example, some pupils might have poor reading skills, and hence, 
do not understand the items well enough. As a consequence, 
factor loadings in this subgroup may be lower (or residual vari- 
ances may be larger) compared with other subgroups, and such 
differences may lead in turn to an observed heterogeneity. 

3.2.2. Structural model 

The structural model for the latent variable vector risked ls gi ven 
for each subject i in cluster k by 

t llik\c ik = c,D k = d = "ked + RlkcdFl(tllikcd) + r Ikcdd (Xlft) + KlikciA 

where ttjfoj is an m-dimensional vector of intercepts, ~Q\k c d is an 
m x mjpj) loading matrix. F\( ■ ) is a smooth polynomial func- 
tion mapping the m-dimensional vector of latent variables risked 
to an fM(pj) -dimensional vector Flunked)- T\k c d is an m x Q(d) 
matrix with regression coefficients. G\ ( • ) is a smooth polynomial 
function mapping the Q-dimensional vector of covariates xi,-jt to 
a Q(Gi) -dimensional vector Gi(xi;/t). Note that for identification 
purposes, vector G\ (x^) has to be completely different from vec- 
tor gi(xijjfc). £ nk c d is an m-dimensional vector of residual variables 
with zero mean vector and covariance matrix ^lked- 

3.2.3. Mixture part 

The model for the latent categorical variable C,k is a multinomial 
logit model 

MQ t = ^ = ^ 1= x lft )= ^W (5) 

where a\ked and h\k c d are regression coefficients, and h\( ■ ) is 
again a smooth (e.g., polynomial) function. 

3.2.3.1. Example. In the following illustrative example, the math 
skills of pupil i from school k ( ?7 1 3 ifc c ) are predicted by the atti- 
tude toward reading (tinte) an d by experienced teaching abilities 
(riuike', see also the example above). All three constructs are mod- 
eled as latent variables, which are measured with three indicator 
variables each. In addition, we assume that math skills can be 
predicted by gender, which is introduced into the model as an 
observed covariate (x\\ik)- F° r simplicity, the model is restricted 
to the within-level. Furthermore, it is assumed that there is unob- 
served heterogeneity due to a latent class C,k- Membership in one 
of the latent classes is predicted by a second observed covariate 
Xnik (e.g., additional private math lessons). In contrast to an ordi- 
nary linear approximation of the relationship between the latent 
variables, the unknown and potentially curvilinear relationship 
is approximated by a latent spline model. Figure 3 illustrates the 
proposed model; the semiparametric spline model is indicated by 
the snake-type arrow. 

3.3. LEVEL 2 -BETWEEN (CLUSTER) LEVEL 

The multilevel (between) part of the model is conceptual- 
ized as follows. Each of the intercepts (v\kcd, aked, diked) and 




FIGURE 2 I A measurement model for subject 1 for two latent variables 
with a mixture distribution on the within-level (the between-level ith 
not included in this example). The mixture distribution is symbolized by 
the frame with dashed lines. It was assumed that all subjects belonged to 
one latent class D = 1 on the between-level so that the index d could be 
omitted. 
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slopes or loading parameters (Ai kcd ,Ki kcd ,Bi kcd ,Ti kcd ,bi k cd) 
in Equations (3), (4), and (5) can be either a fixed coef- 
ficient or a random effect that varies across the observed 
clusters k. 

3.3.1. Structural model 

Let r) 2kd be the [/-dimensional vector of all such random 
effect variables and any additional between-level latent exoge- 
nous variables that explain these random effects and vary 
across the clusters. Note that rf 2kd is different from r) ukcd which 
is the individual-level latent variable vector. For a given cluster k, 
the between-level structural model for r\ lkd is defined as 

i)2k\D k = d = ti d + VidFiiVikd) + T 2d G 2 (x 2k ) + Sikd ( 6 ) 

where /i d is a [/-dimensional vector of intercepts, and B 2d is a 
17 x U(p 2 ) loading matrix. F 2 ( ■ ) is a smooth polynomial func- 
tion mapping the [/-dimensional vector of variables n\ 2kd to a 
U(f 2 ) -dimensional vector F 2 (ri 2kd ). T 2d is a U x V{g 2 ) matrix 
with regression coefficients. x 2k is a V-dimensional vector of 
all observed unexplained between-level covariates that may have 
an additional influence on the variables in vector r) 2kd . Note 
that x 2k is different from xu k . G 2 ( ■ ) is a smooth polynomial 
function mapping the V-dimensional vector of between-level 
covariates x 2k to a V(g 2 ) -dimensional vector G2(x 2 jt). t, 2kd is a 
17-dimensional vector of residual variables with a zero mean 
vector and covariance matrix ^> 2d . fi d , B 2d , and T 2d are fixed 
parameters. 

3.3.1.1. Example. Suppose that the model in Figure 3 is extended 
to allow for multilevel effects on the between-level (Level 2). In 
Figure 4 depicts a latent random intercept model that implies a 
school-specific intercept (a3 kd ) for school k when the math skills 
(inikd) °f a given pupil i are examined. In order to approximate a 




FIGURE 3 | Structural model for subject / in latent class C ik with a 
nonlinear spline relationship between the latent variables (indicated 
by the snake-type arrow). Note that this figure shows only a single-level 
model; the index d is therefore omitted. 



potentially non-normal distribution of the school-specific inter- 
cepts or to reveal a certain heterogeneity in the latent intercepts 
(i.e., average math skills), a latent mixture model with the latent 
categorical variable D k is applied. This mixture reflects Level- 
2 heterogeneity that may stem from (unobserved) sources, for 
example, certain school characteristics that influence the average 
math skills in school k. 

3.3.2. Measurement model 

Let z* k be the L-dimensional vector for cluster k that includes 
scores on all observed between-level variables that are indicators 
of the latent variables in vector i\ 2kd . For a given cluster k, the 
measurement model is defined by 

z k \D k = d = v 2 d + A 2d f 2 (r) 2kd ) + K 2d g 2 (x 2k ) + e 2kd (7) 

where v 2d is an L-dimensional vector of intercepts, A 2d is an 
L x U(j 2 ) loading matrix. f 2 ( ■ ) is a smooth polynomial func- 
tion mapping the [/-dimensional vector of variables r) 2kd to a 
U(j 2 ) -dimensional vector f 2 (i) 2kd ). K 2 d is an L x V(g 2 ) matrix 
with regression coefficients. x 2k is the V-dimensional vector of 
all observed unexplained between-level covariates that may have 
an additional influence on the indicator variables zjj;. g 2 ( ■ ) is a 
smooth polynomial function mapping the V-dimensional vec- 
tor of between-level covariates x 2k to a V( g2 ) -dimensional vector 
g 2 (x 2k ). Note that for identification purposes g 2 (x 2k ) has to be 
completely different from Gatei). e 2k d is a L-dimensional vec- 
tor of residual (mixture) variables with a zero mean vector and 
covariance matrix & 2 d- v 2 d, A 2 d, and K 2 d are fixed parameters. 




subject i 




OL-ikd ) ! D 



cluster k 



FIGURE 4 | Structural model for subject / in cluster k with a nonlinear 
spline relationship between the latent variables on the within-level 
(indicated by the snake-type arrow) and a random intercept (0,3^) that 
is modeled as a mixture of normal distributions on the between-level. 
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3.3.3. Mixture part 

The model for the between-level categorical variable D k is also a 
multinomial logit regression 



Pr(D k = d\x 2 = x 2k ) 



exp(a 2 rf+b^h 2 (x2t)) 
^ t exp(« 2f +b^ t /i 2 (x 2 it)) 



(8) 



where aid and hid are regression coefficients, and hz(-)is again a 
smooth (e.g., polynomial) function. 

3.3.3.1. Example. In this last example (see Figure 5, the ran- 
dom intercept model in Figure 4 has been expanded by adding 
two latent Level-2 predictor variables (r?2ifa/ and r\n k d) that may 
influence the average math-skill level, for example, structural 
problems and social problems in school. Besides the linear effects 
of the latent predictors, there is an interaction effect that models 
the hypothesis that high scores on both between-level predictors 






Vlik 






V&ik 





subject i 



X2lk 



I) 




cluster k 



FIGURE 5 | Structural model for subject i in cluster k with a spline 
relationship between the latent variables on the within-level 
(indicated by the snake-type arrow), and a random intercept {u 3kd ) 
that is predicted by an interaction model on the between-level. The 

distribution of the between-level's predictors is approximated by a mixture 
of normal distributions. The latent categorical variable D k is predicted by a 
between-level covariate X2iit- 



may lead to a particularly low (or high) average math-skill level. 
A potential heterogeneity of the latent predictors (e.g., a non- 
normal distribution) is taken into account by introducing a latent 
categorical variable D k . In addition, a manifest predictor variable 
X2ih for example, school size or school type (private or public), is 
included in the model to predict the latent class probability of D k 
as described more generally in Equation (8). 

3.4. SUMMARY 

In the model described by Equations (3) to (8), the latent vari- 
ables on Level 1 e U kcd, and ? likcd ) and on Level 2 [y\ lkA , 
€2kd> and %2kd) are conceptualized as variables stemming from 
mixtures on level 1 and level 2, respectively. The possibility of 
specifying within- and between-level mixture components is a 
result of introducing the latent categorical variables Q k and D k on 
the individual and cluster levels, respectively. On the within-level, 
unobserved latent classes may refer to different subpopulations 
(within each cluster), for example, pupils with different socioe- 
conomic backgrounds in a given school. On the between-level, 
latent mixtures additionally allow for a specification of hetero- 
geneity across/between observed clusters, for example, hetero- 
geneity that is caused by certain characteristics of the schools. 
Furthermore, due to the conceptualization of mixture variables, 
a semiparametric modeling of non-normally distributed latent 
variables is available (e.g., Yang and Dunson, 2010; Kelava and 
Nagengast, 2012; Kelava et al., 2014), or a simple semiparametric 
formulation of the latent relationships (e.g., Bauer, 2005) is possi- 
ble. Finally, the implementation of general polynomial functions 
Fli ■ )>/i( • )> Gi( • ), and g\ ( • ) allows for a flexible inclusion of 
different parametric or semiparametric relationships (e.g., inter- 
action effects or splines; Hastie et al, 2009), which extends the 
opportunities to model non-linear effects (e.g., Guo et al., 2012; 
Song etal, 2013). 

4. MODEL IDENTIFICATION 

As in any other latent variable framework, within the GNM- 
SEMM framework, the user must ensure that the specified model 
is identified. In this section, we will summarize important aspects 
that need to be considered even though model identification 
is not straightforward (cf. San Martin et al., 2011; Song et al., 
2013). For the identification of the proposed model, four separate 
aspects need to be taken into account. However, the actual identi- 
fication of a specific model needs to be examined individually. 

First, within each mixture component standard assumptions 
for non-linear structural equation models need to be met. This 
mainly implies that restrictions be placed on manifest scaling 
variables or latent exogenous variables (e.g., a necessary condi- 
tion for the identification is to set one factor loading for each 
latent predictor variable or the latent predictors' variance to one). 
In addition, either the latent intercepts of the indicator variables 
or the latent intercepts of the latent variables may be estimated 
in a model. Note that when latent exogenous variables (e.g., 
Winked, Winked) are identified, their latent product terms (e.g., 
InikedWniked) do not need product indicators for identification 
(cf. Klein and Moosbrugger, 2000). 

Second, regarding the inclusion of polynomial functions for 
the observed covariates, it is necessary that the vectors gi(xu k ) 
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and Gi(xi;jfc) on Level 1 and, respectively, the vectors gii^ik) and 
Gafejfc) on Level 2 are completely different from each other. For 
example, a model including gi(xuk) = Gi(x llk ) = (x nik , x 2 uik )' 
would not be identified because Xufc would be a predictor in the 
measurement and structural models [see Equations (3) and (4)]. 
In this case, two effects of xmk would be estimated simultane- 
ously on the right side of one regression equation, which would 
not be identified. The same holds for the polynomial functions 
of the latent variables. Again, f\{rjn kcd ) an d PiiVukcd) on Level 
1 as well asf 2 (ri 2kd ) an d ^li^likd) on Level 2 have to be unequal 
[see Equations (7) and (6)] 2 . Otherwise, perfect collinearity would 
be the result, meaning that the covariates and latent variables, 
respectively, would have the same influence on the measure- 
ment and the structural models. Their impacts would not be 
separable. Furthermore, polynomial (semiparametric) functions 
should not include constants. Otherwise, latent intercepts in the 
measurement and structural models would not be identified. 

Third, on the between (cluster) level the inclusion of latent 
exogenous variables, which explain the variability in the ran- 
dom coefficients, requires measurement models (see Figure 5). 
The exogenous latent variables at Level 2 need to be identified 
as well according to identification rules, which are the same as in 
single-level structural equation models. 

Fourth, additional assumptions concerning the latent classes 
of the mixture components are required. For the identification of 
the discrete latent variables, (a) the unconditional probabilities in 
Equations (5) and (8) need to sum up to one. and (b), the ambi- 
guity of mixture components due to the so-called label switching 
problem makes it necessary to impose additional (artificial) con- 
straints or relabeling strategies e.g., restrictions on the mean 
structure or ordinality of mixture proportions (see Equations 
15-19; Redner and Walker, 1984; Stephens, 2000; Kelava and 
Nagengast, 2012). 

Note that the identification of separate parts of a model (e.g., 
the measurement model and the structural model) does not auto- 
matically imply that the whole model is identified. General nec- 
essary and sufficient conditions to guarantee the identifiability of 
a latent variable model are difficult to establish. Hence, we focus 
primarily on the necessary identification conditions in this article. 

5. MODEL ESTIMATION 

Generally speaking, latent variable modeling offers a large 
variety of methods for the estimation of specified models. The 
choice of the best estimation method strongly depends on the 
distributional assumptions of the observed and latent variables, 
the given sample size, the type of specified model, potential 
confounders, and many more aspects. Just to mention a few 
large classes, these methods comprise maximum likelihood esti- 
mators (e.g., Joreskog, 1973; Rabe-Hesketh et al., 2005; Muthen 
and Asparouhov, 2009), least squares methods (e.g., Joreskog 
and Goldberger, 1972; Browne, 1974, 1984), and methods of 
moments (e.g., Bentler, 1983), among others. For example, when 
applying a maximum likelihood estimator, in the well-known EM 
algorithm (Dempster et al., 1977), which treats latent variables as 



2 An exception is the special case in which the coefficient matrix B = 0: that is, 
for confirmatory factor models. 



missing data, the likelihood L of the observed indicator vector y 
is given as: 

i=Y[Y, Pr{Dk = d) f *2kd(v2 kd )Y[ 
k d J i 

J^PriQk = c) j fukcd(YikHukcd(Vi,kcd) dr li,kcd^ dr l2kd 

(9) 

where f Vlkcd { • ), i/ f iifed( ■ ), and \// 2kd ( • ) are probability density 
functions for the observed variables y, and the latent variables 
Viikcd and >/2iW> respectively (cf. Muthen and Asparouhov, 2009). 
Because the likelihood function L of the observed indicator vector 
jik is not given in closed form in general, numerical integra- 
tion can be utilized in the evaluation of the likelihood using 
both adaptive and non-adaptive quadrature. As an alternative, 
the likelihood could be directly optimized by applying a quasi- 
Newton algorithm. Both approaches of estimating parameters are 
very complex due to the non-linearity (for a discussion of latent 
interaction effects, see Klein and Moosbrugger, 2000). 

However, in recent years, the Bayesian framework has become 
very popular in latent variable modeling (e.g., Lee et al., 2004; Lee, 
2007; Lee et al, 2007; Song et al., 2009). The main reason is that 
it provides flexible options for specifying and estimating mod- 
els. Bayesian estimation methods and algorithms (e.g., Markov 
Chain Monte Carlo: MCMC) can handle numerous complex 
parametric, semiparametric, and non-parametric relationships 
and distributions, for example, latent mixture distributions (e.g., 
Yang and Dunson, 2010; Kelava and Nagengast, 2012), non-linear 
models (e.g., Lee et al, 2007; Guo et al, 2012; Song et al, 2013), 
and multilevel structures (e.g., Fox and Glas, 2001; Song and Lee, 
2004). Referring to the proposed GNM-SEMM framework with 
its semiparametric functional forms and its capability of con- 
sidering non-normally distributed variables, a Bayesian approach 
seems to be a viable way to estimate models. In this sense, we will 
provide general descriptions of the specifications of the variables' 
distributions and the selection of prior distributions. 

Parameter vectors are defined as follows: For the Level- 1 
parameters, let 8 Mlkcd = (v' lkcd , vec(A lkcd Y, vec(K lkcd Y, 
vec(®\ kcd yy for the measurement model, where vec( ■ ) vector- 
izes all elements of a given matrix. For the structural model, let 
Osikcd = ( a ' kcd > vec(Blfcrf)'. vec(T lkcd y, vec(^ lkcd YY, and for the 
mixture model part let Q m \ kcd = (a\ kcd , fajy)'. Analogously, for 
the Level-2 parameters, let Quid = ( v ' 2d > vec (^-2dY> vec{K 2d )' '» 
vec(&2dYY for the measurement model. For the structural 
model, let 8 S2d = [/i' d , vec(B 2d Y, vec(T 2d Y, vec[9 2d )'Y, and 
for the mixture model part let 9 m 2 d = (a 2d , ^>' 2d )' ■ Finally, let 
f9jvri, &s\, 6 m i, #M2> #S2> and 6 m2 be the vectors that include 
all parameters from the defined model parts across all latent 
classes c = 1, . . . , CI, d = 1, . . . , D*, and observed clusters 
k=l,...,K. 

5.1. SPECIFICATION OF THE VARIABLES' DISTRIBUTION 
5.1.1. Level 1 

For the Bayesian analysis, the j = 1 , . . . , / indicator variables on 
Level 1 are specified as a cluster-specific mixture distribution. The 
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single mixture is given as 

Y*k\e M l,Osi^iik.Cik = <:,D k = d ~ N(fl y (0Mlkcd, Qsikcd, Xjjjfc), ®i k \ d )(W) 

where fi y (0Mikcd, @sikcd, xiut) is the vector of conditional expec- 
tations of y* k , which are specified in Equation (3) and depend 
on the parameter vectors &Mikcd an d 9s\kcd> and on the covari- 
ate vector Xijjfc. ®^lcd ls ^ e P rec ision matrix of the multivariate 
normal distribution of the measurement error variables (i.e., the 
inverse of the covariance matrix). The model implies a specific 
mean vector and covariance matrix for subjects stemming from a 
certain latent class c on Level 1 that is clustered in a latent class 
d on Level 2, which in turn is given for an observed cluster k. 
Within each cluster k, y* k is a mixture of D* components, which 
model heterogenity in the observed clusters. Further, within in 
each mixture component d, y* k is a mixture of C d components, 
which induce heterogenity on the individual level (C5 may change 
across different latent classes on Level 2). 

The latent variables i/ijjfcai on Level 1 are specified as 

rii 1 k\e sl ,x 1 , k ,c,k = c,D k = d ~ N(fi' n (e sikcd , x lik ), ¥7^) (11) 

with the vector ft' 11 (Qsikcd, xii/t) of conditional expectations for 
^ liked that depend on the parameter vector Qs\kcd an d covariate 
vector xuk as specified in Equation (4) as well as in the precision 
matrix 

5.7.2. Level 2 

Analogous to the specification of the variables' distributions on 
Level 1, the indicator vector z* k is modeled as 



Z k>8M2,&S2,X2k,Dk = d ' 



N(fl Z (OM2d,0S2d,X2k),&2d ) ( 12 ) 



with the vector fi z (^Midi #S2d> x 2d) of conditional expectations 
for zf as specified in Equation (7) and precision matrix &T, ;. 
The unconditional indicator vector z k is composed of D* mixture 
components. Finally, the distribution of the latent variable vector 
V2kd> is g iven as 



*l2k\e S2 ,x 2k ,D k = d ~ N(ll1*{0S2d, x 2fc) 



(13) 



with the vector of conditional expectations Ii r,2 (@s2d> x 2k) speci- 
fied in Equation (6) and precision matrix ¥7/, . 

5.2. SPECIFICATION OF PRIOR DISTRIBUTIONS 

For the prior specification, informative or non-informative priors 
can be selected (Gelman et al., 2004). This selection is primarily 
based on the availability of prior knowledge. Because the applica- 
tion of non-informative priors may lead to suboptimal solutions 
(e.g., Lee et al., 2007), it may be necessary to analyze parts of 
the model (e.g., a confirmatory factor analysis for the Level-2 
predictors) to obtain information about the parameters. Here, a 
very general description of the proposed model is provided. For a 
detailed description of priors see Gelman et al. (2004). 

The class probabilities Pr (Cat = c\Dk = d, ~x.uk) a nd Pr{D k = 
d\x 2 k) depend on the multinomial logit models given in 
Equations (5) and (8) and thus depend on the parameters in 
6 m i and 0 m 2- For these parameters, uninformative priors are 



suggested unless information about heterogeneity is available (see 
also Kelava and Nagengast, 2012). 

For each precision matrix of the mixture distributions defined 



above, that is for ® 2 rf f° r me indicator variables, and for 
*iibd> f° r me l a tent variables, a multivariate normal distri- 
bution is assumed within each component. Conjugate priors are 
then given for c = 1, C d , d = 1, . . . , D* as 

"ifcrf VV(KJ oikcd'P > 



0 



2d 



'02d 
.-I 



Xkcd 



2d 



(14) 



The hyperparameters p and the (positive definite) matrices 
®0iitcd> ®02d> ^axkcdi an d *o2d of the Wishart distribution 
include parameter information that may stem from previous 
studies or knowledge about the parameters. For example, *02d 
includes information about the variances and covariances of the 
random coefficients, and about the latent endogenous and exoge- 
nous variables on Level 2. This information may refer to estimates 
of the (co)variances for the latent exogenous variables retrieved 
from a separately estimated confirmatory factor analysis. 

The conjugate priors can be modified, for example, if the 
residual covariance matrix &2d on Level 2 is assumed to be 
diagonal, then each diagonal element ( j = 1 , . . . , /) can be 

assumed to be inverse Gamma distributed, that is (O^) 1 ~ 
Gamma(oti , fij ) (with hyperparameters a, fi) (Kelava and 

®2d 2d 

Nagengast, 2012). Further information about the selection of 
priors for count or ordinal data can be found in Song et al. (2013). 

For the other parameters in 8mi, &si> #M2> a nd 9s2, nor- 
mally distributed priors are used within each mixture component. 
Though, the definition of some priors needs to be formulated 
recursively (cf. Kelava and Nagengast, 2012). For example, let 
^\kcd De the ^'-th element of the vector v\k c d (which specifies the 
intercept of the j-th variable in y* k \ Cjk = CtDk = d ), and let <S/ lkcd 
be the j-th diagonal element in the matrix &ik c d- Then for the 
latent classes c = 1, d = 1, the conjugate (normal) prior for vL.j 
is specified as 



4iil® ? mi~^omi.0 ? mi H o) 



(15) 



with hyperparameters Ho and that include information 

about vLjp For all other latent classes, that is c > 1 or d > 1, the 
following prior is selected: 

v'lkld^lkld = V iicl(d-l)l®lA:l(d-l) + A lil(d-l)l®ljfclrf 

ifc= 1, d > 1 (16) 

"ifccl I ®lJfccl = ^lfcCc— 1)1 1 ®ifc(c— 1)1 + ^li(c-l)ll®lfo;l 

ifo 1, d= 1 (17) 



Mi 



+ A 



'ikcd'^lkcd — 1fc(c-l)(d-l)l ,J llk(c-lXd-l) ^ "lfcCc-l)(d-l)l w l]W 

else, (18) 
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with 

If parameters are constrained to be the same across mixture com- 
ponents (e.g., vikcd = Vik and &ikcd = Bijfe), Equations (15) to 
(19) simplify to 

4<~^Kl*>©i* H o)' (20) 

For the other parameter matrices, that is for Aifad, Kiked> 
Ukcd,Bikcd,T lkcd and so forth on Level 1 and v 2( j, A 2 d, 
K2d, /tj, B 2 d, T 2 d and so forth on Level 2, a specification corre- 
sponding to the formulation above given is straightforward when 
the appropriate precision matrices are used. In order to avoid the 
label-switching problem in a mixture distribution, only one of the 
parameter matrices needs to be formulated recursively. 

6. EMPIRICAL EXAMPLE 

In this section, we will provide an extensive illustration of 
the GNM-SEMM with an example that is based on data from 
the Program for International Student Assessment 2009 (PISA; 
Organisation for Economic Co-Operation and Development, 
2010), which is publicly available under http://pisa2009.acer.edu. 
au/downloads.php. The sample was a German subsample ofN = 
1 , 474 pupils from 226 schools who took a math test. Additional 
covariate information were available on the individual level as 
well as on the school level. 

As before, we predicted pupil's math skills (Math) with their 
general attitude toward reading (Att) and the teaching strategies 
they experienced (Strat). We further expected that pupil's average 
math skills (latent intercept of Math) would vary systematically 
across schools 3 , and that this variation could be (partly) accounted 
for by Level-2 predictors with measurement errors, here, struc- 
tural problems in school (Prob) and the schools's social environment 
(Soc). 

We will report the results for a model that accounted 
for different aspects of the general model. The example is 
not exhaustive with regard to all potential parameters within 
the GNM-SEMM framework, but it provides an indication 
of the flexibility of the proposed framework in accommodat- 
ing different aspects of the data: A spline model on Level 1 
described a semiparametric flexible relationship between Att, 
Strat, and Math. A random intercept for Math was explained 
by the Level-2 predictors Prob and Soc, and the interaction 
effect between them. Furthermore, a mixture model accounted 
for the non-normality of the latent predictors on Level 2 
(heterogeneity). 

6.1. MODEL FORMULATION 

In the following, we will provide the specified measurement and 
structural equations for the model. For reasons of clarity, we 
restricted the subscripts (k, c or d) in the model formulation 
to those model parameters that actually depended on the latent 



3 The ICC was 0.407 for the manifest variable, which was the sum of all Math 
items. 



classes or the Level-2 model. Figure 6 presents a diagram of the 
model and its parameters. 

6. 1. 1. Structural models 

The Level-1 structural model [cf. Equation (4)] for the i-th pupil 
in school k was given by 

yiik = <*k + BiFi(riiik) + $Hk 

©-G)*Q«=a*S 

(21) 

where Fu and F22 both defined a latent cubic spline model with 
two knots at §1 = 2, ^2 = 3 that approximated the (curvilinear) 




pupil i 




FIGURE 6 I Structural models and measurement models on the 
within-level (Level 1) and between-level (Level 2). On Level 1, the math 
skill (Math) of a pupil / is predicted by his/her general attitude toward 
reading (Att) and his/her experienced teaching strategies (Strat). The 
snake-type arrows indicate a flexible spline approximation of the latent 
variable relationship. On Level 2, the average math skills of pupils (latent 
intercept a^k) in school k are explained by a nonlinear interaction between 
structural problems in the school (Prob) and the school's social environment 
(Soc). The non-normality of the latent predictors is approximated by a 
mixture distribution. 
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relationships between the variables (e.g., Hastie et al, 2009): 
jSjFiiCAttiiO = friAtt;* + PuAttl + p u Att 3 lk 

+ Pu(AUik - + frsfAtt,* - 

P 2 F n (Stnt ik ) = ftiStrati* + feStrat^ + feStrat^ 
+ hi (Strata - + fes (Strata - 

(22) 

Only the latent intercept was assumed to vary across schools. 
The Level-2 structural model [cf. Equation (6)] for school k was 
given by 

Vik\D k = d = Hd + B 2 -F2(i?2foi) + izk 

Prob t | Dl = d\ /A*w\ / 0 0 0\ 
Socjtb^d = U 2d + 0 0 0 

I Probju \ /?2iA 

Socjtd + fe* (23) 

\Probfaj • SoCfaj/ \?23/:/ 

with ?2*rf = (Probtd. Socjtd, a 3 ;t)' and FziVikd) = ( Prob w. Soc M , 
Probed • SoQd)'. The product term Probed • Soc^d implemented 
the interaction effect of the structural problems in school and the 
social environment. Because the non-normal distributions of the 
latent predictors were approximated by a mixture distribution, 
their expectations Hid and /x 2 d were assumed to vary across the 
unobserved mixtures (Kelava and Nagengast, 2012). 

6.1.2. Measurement models 

For each of the latent variables between nine and 13 items 
were available; they were aggregated to three indicator variables 
for each latent variable (item parcels) for didactic purposes. 
It was assumed that the indicator variables were continuously 
distributed, resulting in an identity link function in the measure- 
ment model (y* k = and z* k = z*, respectively). 

On Level 1, the measurement model for pupil i in the /c-th 
school [cf. Equation (3)] was given by 

YHc= Vi + Ai/i(lj lft ) + 6ijjt (24) 

where/i(jj lifc ) = (Art,*, Strata, Math*)'. 

On Level 2, the measurement model [cf. Equation (7)] was 
given by 

Zk\D k = d = v 2 + Mf2(ri2kd) + e 2k (25) 

where fx(V2kd) = (Prob/trf, Soqy)'. The factor loading matrices 
Ai and A 2 were formulated with a simple structure (i.e., each 
item loaded on only one latent variable). The residual vari- 
ables €\ik and €2ik were assumed to be mutually uncorrelated 
and normally distributed with zero mean vectors and (diagonal) 
covariance matrices ©i and @2> respectively. 



6. 1.3. Parameter constraints and identification 

Besides employing the standard identification constraints for 
structural equation models, we restricted the measurement model 
parameters and the structural model parameters to be the same 
across schools except for the latent intercept ot^. Due to the 
invariance of the measurement models for the latent predictors on 
Levels 1 and 2, in Equations (24) and (25) the non-linear effects 
in the polynomial spline model and the interaction effect in 
Equations (22) and (23) were identified. For the mixture model, 
we fit two latent classes (Djt = 1,2). 

6.2. MODEL ESTIMATION 

To keep this example as simple as possible, missing data were 
assumed to be missing at random, and this was accounted for 
directly in the analysis by applying the Gibbs sampler (Gelman 
et al., 2004). The analysis of the latent multilevel model was 
implemented by using the R-project software (R Core Team, 
2013) and the OpenBugs package (Lunn et al., 2009). Syntax 
for the empirical example can be obtained upon request from the 
authors. 

6.2. 1. Starting values and prior selection 

Starting values for the measurement model parameters were 
based on the prior analyses conducted in Mplus Muthen and 
Muthen (1998-2010) for separate parts of the model. Informative 
priors were then selected in accordance with recommendations by 
Gelman et al. (2004) as well as Kelava and Nagengast (2012). 

6.2.2. Bayesian analysis 

For the analysis, three chains with 100,000 iterations each 
were generated. The first 75,000 iterations (burn in) were 
then discarded. As proposed by Gelman (1996), convergence 
of the estimation procedure was achieved when all (EPSR 
Estimated Potential Scale Reduction; Gelman, 1996) values 
were below 1.2, which occurred after about 60,000 iterations 
(see the Supplementary Material, Figure SI). Trace plots also 
indicated good convergence (see the Supplementary Material, 
Figure S2). Means, standard errors, t-values, and percentiles 
of the posterior distributions of the parameter estimates 
based on the last 25,000 iterations are reported in the next 
subsection. 

6.3. RESULTS 

We will summarize the main results in this subsection. Detailed 
results for the estimated multilevel model are presented in 
Table 1. In the measurement models, the factor loadings were all 
significant and positive, thus indicating that the latent constructs 
were measured reliably. 

The results for the semiparametric approximation of the true 
relationships between the Level- 1 latent variables Att, Strat, and 
Math are illustrated in Figure 7. The relationship between Math 
and Att resembled a cubic relationship; the subjects' Math scores 
slowly increased with increasing Att scores, whereby a stronger 
increase was found for Att scores greater than 3 and a slight 
decrease for Att scores greater than 4. The relationship between 
Strat and Math seemed to be slightly quadratic with the highest 
Math scores for medium Strat scores. 
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Table 1 | Mean parameter estimates, standard errors, f-values, and 
2.5, 50.0, and 97.5% percentiles. 



9 


SE 


f -value 




Percentiles 










2.5% 


50.0% 


97.5% 






Intercepts 


U121 -1.078 


0.076 


-14.190 


-1.229 


-1.076 


-0.933 


y 131 -0.409 


0.072 


-5.709 


-0.557 


-0.408 


-0.275 


1-152 0-411 


0.118 


3.484 


0.173 


0.414 


0.633 


v m2 -0.419 


0.175 


-2.399 


-0.769 


-0.411 


-0.089 


vies 0.058 


0.018 


3.317 


0.023 


0.058 


0.092 


1-193 0-340 


0.016 


21.069 


0.308 


0.340 


0.372 


Factor loadings 


A.121 1.141 


0.026 


43.986 


1.091 


1.141 


1.192 


A.131 0.997 


0.024 


40.814 


0.951 


0.996 


1.047 


A n52 0.687 


0.043 


15.864 


0.605 


0.686 


0.774 


J. 16 2 1.213 


0.064 


18.849 


1.091 


1.210 


1.343 


^183 0-7 54 


0.030 


25.279 


0.696 


0.754 


0.814 


^193 0.553 


0.027 


20.437 


0.501 


0.553 


0.607 


Path coefficients 


0.005 


0.161 


0.031 


-0.312 


0.004 


0.320 


p12 0-009 


0.117 


0.079 


-0.221 


0.009 


0.237 


/5 13 -0.005 


0.029 


-0.174 


-0.061 


-0.005 


0.052 


0.046 


0.068 


0.678 


-0.088 


0.046 


0.180 


(815 -0-164 


0.104 


-1.580 


-0.367 


-0.165 


0.037 


ftn -0.070 


0.192 


-0.366 


-0.453 


-0.060 


0.287 


P22 0-079 


0.080 


0.992 


-0.073 


0.077 


0.239 


fe -0.017 


0.012 


-1.449 


-0.040 


-0.017 


0.006 


,624 0.007 


0.016 


0.443 


-0.025 


0.007 


0.039 


P25 0.018 


0.029 


0.620 


-0.040 


0.018 


0.073 


Means/intercepts of latent variables 


a! 2.856 


0.022 


129.972 


2.813 


2.856 


2.899 


a 2 2.700 


0.019 


143.505 


2.663 


2.700 


2.737 


(Co)variances of latent variables 


f m 0.506 


0.025 


20.357 


0.459 


0.506 


0.556 


Vf 12 i 0.072 


0.012 


5.866 


0.048 


0.072 


0.097 


fi22 0-250 


0.019 


13.076 


0.215 


0.250 


0.291 


^133 0.041 


0.003 


15.847 


0.036 


0.041 


0.046 


Residual variances 


6,11 0.147 


0.009 


16.124 


0.129 


0.147 


0.165 


0,22 0-198 


0.012 


16.607 


0.176 


0.198 


0.222 


6,33 0.212 


0.011 


19.288 


0.191 


0.212 


0.234 


6,44 0.21 2 


0.014 


14.708 


0.184 


0.213 


0.241 


0, 55 0.323 


0.014 


22.999 


0.297 


0.323 


0.352 


e, 66 0.21 9 


0.019 


11.251 


0.181 


0.219 


0.257 


6177 0.066 


0.003 


19.760 


0.059 


0.066 


0.072 


0188 0.047 


0.002 


20.197 


0.042 


0.047 


0.052 


6,99 0.049 


0.002 


23.364 


0.045 


0.049 


0.053 



(Continued) 



Table 1 | Continued 



9 SE t-value Percentiles 



2.5% 50.0% 97.5% 



LEVEL-2 PARAMETERS 
Latent class probabilities 



P(D=1) 0.532 


0.255 


2.082 


0.069 


0.537 


0.955 


P (D = 2) 0.468 


0.255 


1.835 


0.045 


0.463 


0.931 


Intercepts 


V221 0.759 


0.415 


1.829 


-0.063 


0.767 


1.550 


U231 0.603 


0.277 


2.180 


0.048 


0.603 


1.143 


1)252 - 0.024 


0.184 


-0.129 


-0.391 


-0.020 


0.331 


1-262 0-279 


0.179 


1.556 


-0.077 


0.281 


0.614 


Factor loadings 


A221 1-029 


0.206 


4.999 


0.635 


1.027 


1.439 


X231 0.700 


0.137 


5.113 


0.434 


0.699 


0.970 


*252 1 -0 02 


0.091 


11.071 


0.828 


1.001 


1.180 


^262 0.794 


0.088 


8.992 


0.629 


0.793 


0.969 


Path coefficients 


P3 0.558 


0.101 


5.512 


0.381 


0.550 


0.776 


p A 0.442 


0.108 


4.072 


0.261 


0.432 


0.680 


fi 5 -0.289 


0.053 


-5.483 


-0.405 


-0.285 


-0.199 


Means/intercepts of latent variables 


M11 1.921 


0.099 


19.408 


1.685 


1.935 


2.076 


/i 21 1.938 


0.081 


24.063 


1.739 


1.950 


2.062 


HU 2.107 


0.123 


17.088 


1.931 


2.084 


2.424 


1122 2.091 


0.104 


20.076 


1.955 


2.071 


2.367 


112 -0.365 


0.162 


-2.248 


-0.686 


-0.361 


-0.059 


(Co)variances of latent variables 


f2V 0.291 


0.049 


5.959 


0.207 


0.287 


0.396 


i/f 2 2i 0.007 


0.024 


0.304 


-0.038 


0.007 


0.055 


^222 0.239 


0.030 


7.986 


0.186 


0.237 


0.305 


f 2 33 0-051 


0.005 


11.169 


0.042 


0.050 


0.060 


Residual variances 


02ii 0-415 


0.059 


6.970 


0.305 


0.413 


0.539 


0222 0.723 


0.098 


7.412 


0.543 


0.718 


0.927 


0233 0.366 


0.046 


7.898 


0.280 


0.364 


0.461 


0244 0.183 


0.022 


8.204 


0.143 


0.182 


0.230 


0255 0-130 


0.017 


7.763 


0.100 


0.129 


0.165 


0266 0-176 


0.020 


8.940 


0.141 


0.175 


0.217 



In order to test the hypotheses on the cubic relationship for 
Att and the quadratic relationship for Strat 4 , we estimated a 
model that changed Equation (22) to /3iF(Att;jt) = j8nAttft+ 



A direct inference with regard to a parametric relationships, including a 
linear relationship, based on the parameter estimates for the spline model 
(e.g., Pu) is not straightforward (Cox et al., 1988; Cox and Koh, 1989; 
Zhang and Lin, 2003; Liu and Wang, 2004). In general, an additional model 
that can actually test a parametric hypothesis seems to be a viable procedure 
(Azzalini and Bowman, 1993). 
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FIGURE 7 | Semiparametric Level-1 relationships between pupils' 
math skills (Math) and their general attitude toward reading (Att; 
left), and Math and experienced teaching strategies (Strat; right). 



Table 2 | Mean parameter estimates, standard errors, f-values, and 
2.5, 50.0, and 97.5% percentiles for the parametric model (cubic 
relationship for Att and quadratic relationship for Strat) on Level 1. 

0 SE (-value Percentiles 

2.5% 50.0% 97.5% 



PATH COEFFICIENTS 



011 


-0.045 


0.139 


-0.324 


-0.307 


-0.045 


0.232 


012 


0.005 


0.055 


0.097 


-0.105 


0.006 


0.112 


013 


0.003 


0.007 


0.354 


-0.012 


0.003 


0.017 


021 


0.154 


0.082 


1.877 


0.000 


0.157 


0.329 


022 


-0.034 


0.016 


-2.086 


-0.067 


-0.034 


-0.003 



PnAttl + ft 3 At4 and p 2 F 12 (Strat*) = fciStrat,* + feStrat^. 
Results for the structural parameters on the within-level can 
be found in Table 2. The parametric cubic relationship for Att 
was not significant (/J13 = 0.003, p = 0.745 for the cubic effect 
and = -0.045, p = 0.723 for the linear effect). The attitude 
toward reading did not significantly predict the math ability. 
The parametric model for Strat indicated a significant negative 
quadratic relationship (P22 = — 0.034, p = 0.037). This indicated 
that pupils' math skills were highest for those subjects who rated 
the experienced teaching strategies as average. 

On Level 2, the random intercept factor a^k had a signifi- 
cant negative intercept (/13 = — 0.365, p = 0.024) and an unex- 
plained variance across schools of 1/^233 = 0.051. The linear effects 
of the predictors were significant with fij, = 0.558 (p < 0.001) 
for school problems (Prob) and ^4 = 0.442 (p < 0.001) for 
social problems (Soc). The interaction effect was significant and 



The gray crosses indicate the predicted slope with a predicted 
school-specific random intercept; the black line indicates the predicted 
Math score for the mean random intercept. 



alpha_3 




FIGURE 8 I Between-level: Three-dimensional illustration of the 
relationship between school problems (Prob), social problems (Soc), 
and the random intercept a 3 i, of Math. 



negative with jfj 5 = -0.289 (p < 0.001). Figure 8 illustrates the 
complex non-linear association between Prob, Soc, and the ran- 
dom intercept a^- The expected math level of a school with 
an average score on school and social problems was about 0.5 
(E[a3\Prob = Prob, Soc = Soc] = 0.461); the expected math level 
was higher in schools for which one of the problems was above 
average and the other was below average; and the math level 
decreased rapidly when both problems increased together. 
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FIGURE 9 | Predicted slightly non-normal densities of the Level-2 predictors Prob and Soc obtained from a two-class solution. 



Finally, the results of the mixture model for the Level-2 predic- 
tors are illustrated in Figure 9. As can be inferred from Figure 9, 
the distribution of the latent variables was slightly non-normal. 
In this empirical example, the means of the latent variables in the 
two classes were marginally different (with means of about An ~ 
fi.21 ~ 1.9 in Class 1 and jlu ~ A22 ~ 2.1 in Class 2). Additional 
analyses may reveal the necessity to increase or decrease the num- 
ber of latent classes (e.g., using the DIC). Here, the DIC was 
14,780 for a model including the mixtures and 14,770 for a model 
without the mixture distribution. This indicates that a mixture 
may not have been necessary in this case. 

7. DISCUSSION 

In this article, we presented a generalized non-linear multilevel 
structural equation mixture model (GNM-SEMM) framework. 
A key characteristic its ability to specify non-linear functional 
relationships between outcome variables on one side and latent 
predictors or manifest covariates on the other side by using 
semiparametric regression functions (e.g., splines; Freund and 
Hoppe, 2007; Hastie et al., 2009). This feature is given for 
both levels, the within and between (cluster) levels of nested 
data structures. Given that (multilevel) latent variable modeling 
frameworks are typically linear (Bollen, 1989; van der Linden 
and Hambleton, 1997; Rabe-Hesketh et al., 2004; Muthen and 
Asparouhov, 2011), the relaxation of the linearity assumption 
is a step forward toward a more realistic approximation of a 
non-linear world. It thus extends the hitherto available multilevel 
modeling frameworks. 

A second key characteristic is the ability to specify latent mix- 
ture distributions on both levels. As in recent semiparametric 
latent variables approaches (e.g., Bauer and Curran, 2004; Bauer, 
2005; Kelava et al., 2014), this allows for an approximation 
of non-normally distributed latent predictor variables for a 



thorough introduction with regard to manifest variables, see 
McLachlan and Peel (2000). Again, the relaxation of a typical 
assumption that can be found in most applications of latent vari- 
able modeling allows for a more precise modeling of relationships 
for heterogeneous populations or distributions. 

A third key characteristic of the proposed approach is that 
it is flexible enough to specify a large number of special cases. 
For example, it offers the ability to approximate a non-normal 
distribution using mixture modeling and provides an easy way 
to interpret the parametric functional form of the latent vari- 
able relationship. As another example, it is possible to specify 
a non-linear latent variable relationship in one subpopulation 
but not in the other. The same is true for different levels. If 
functional forms of the relationships are unknown, semiparamet- 
ric approximations of these relationships are also possible using 
mixtures. 

Taken together, these properties are desirable. Nevertheless, 
the identification and estimation of the models is a general issue. 
Additional assumptions have to be introduced as was exem- 
plified in the sections before (see Level- 1 section on the mea- 
surement model). Fortunately, these assumptions are standard 
identification assumptions in latent mixture, latent (non)linear, 
and (semi)parametric modeling, but researchers should be care- 
ful when specifying models. For example, multiple intercepts in 
spline models might lead to identification issues. However, the 
wide range of specifiable models offers a variety of adaptable 
estimators that could be applied from a theoretical standpoint. 
Bayesian MCMC, Newton-type algorithms, and adapted EM- 
Algorithms are just a few examples. 

In this paper, we also used a substantive example from edu- 
cational science. A complex model was applied to data from the 
large-scale PISA study (Organisation for Economic Co-Operation 
and Development, 2010) illustrating several conditions that may 
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occur in empirical data. First, an a priori unknown curvilinear 
relationship between the latent variables was identified on Level 1 
using a semiparametric latent spline model. Second, the proposed 
mixture part on Level 2 could be used to control for the potential 
non-normality of the latent Level-2 predictors. In this example, 
only a slight indication of non-normality was visible. The model 
may have also been extended to include a mixture model on 
Level 1. Third, on Level 2 a latent random intercept modeled a 
school-dependent math skill, which allowed us to account for the 
clustering of the data. The random intercept was predicted by a 
latent non-linear interaction model. The model may be extended 
further, for example, to test the linearity assumption on Level 2 of 
the relationship between the latent variables apart from the inter- 
action effect. Other random effects could also be included. In any 
case, the specification of these effects should be theory-driven. 

Finally, we want to mention two important considerations. 
The proposed model should be viewed as a general framework 
that includes a variety of different possible models. A model that 
includes all aspects as presented in the model section would be 
highly parameterized and may overfit the data. In each empir- 
ical situation, we recommend that the actual applied model be 
restricted to a simpler model that allows for an adequate but 
parsimonious representation of the data. A decision concerning 
the necessity to include different parts of the model depends 
on the hypothesized model (e.g., random factor loadings in a 
confirmatory factor model or a latent spline to predict a latent 
slope in the structural model) and on model comparisons. In 
the Bayesian framework, Bayesian indices/information criteria for 
model selection (e.g., the deviance information criterion, DIC: 
Spiegelhalter et al., 2002; Celeux et al., 2006; or the Bayes fac- 
tor, Bernardo and Smith, 1994) are the primary model fit indices, 
although they only allow only for a model comparison to be 
made, and they are not absolute fit indices. In general, for (both 
frequentist and Bayesian) non-linear models there are no absolute 
fit indices (Klein and Schermelleh-Engel, 2010). Hence, a top- 
down (or bottom-up) strategy using information criteria may be 
a viable way to improve the model (i.e., to restrict the model to its 
necessary parts). An illustration of such a strategy for multilevel 
models in general can be found, for example, in West et al. (2007). 

Furthermore, we did not show how to implement the pre- 
sented framework with statistical software. In this article, a 
Bayesian estimator was applied and implemented in OpenBugs, 
thus allowing us to analyze a complete but specific semiparamet- 
ric non-linear multilevel model. Future research should improve 
this implementation so that it will be feasibly available within 
standard statistical latent variable software (e.g., Mplus) that 
can be directly applied to different models by the substantive 
researcher. 
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